Abstract

In subfamily Suaedoideae, four independent gains of C4 photosynthesis are proposed, including two parallel origins of Kranz anatomy (sections Salsina and Schoberia) and two independent origins of single-cell C4 anatomy (Bienertia and Suaeda aralocaspica). Additional phylogenetic support for this hypothesis was generated from sequence data of the C-terminal portion of the phosphoenolpyruvate carboxylase (PEPC) gene used in C4 photosynthesis (ppc-1) in combination with previous sequence data. ppc-1 sequence was generated for 20 species in Suaedoideae and two outgroup Salsola species, covering all types of C4 anatomy as well as two types of C3 anatomy. A branch-site test for positively selected codons was performed using the software package PAML. When the four branches where C4 is hypothesized to have developed were labelled (foreground branches), residue 733 (maize numbering) was identified as being under positive selection with a posterior probability >0.99, and residue 868 at the >0.95 level, using Bayes empirical Bayes (BEB). When all the branches within C4 clades were labelled, the branch-site test identified 13 codons under selection with a posterior probability >0.95 by BEB; these are discussed in light of current information on functional residues. The signature C4 substitution of serine for alanine at position 780 in the C-terminal end (which is considered a major determinant of affinity for PEP) was found in only four of the C4 species sampled, while eight of the C4 species and all the C3 species have an alanine residue, indicating that this substitution is not a requirement for C4 function.

Keywords: Bienertia, C4 photosynthesis, PAML, phosphoenolpyruvate carboxylase, positive selection analysis, Suaeda aralocaspica, Suaedoideae.



Introduction 

Phosphoenolpyruvate carboxylase (PEPC) (EC 4.1.1.31) plays an important biochemical role in higher plants by converting bicarbonate (HCO3−) and phosphoenolpyruvate (PEP), in the presence of Mg2+ or Mn2+, into the four-carbon acid oxaloacetate (OAA) and Pi (O'Leary, 1982; Chollet et al., 1996; Izui et al., 2004). OAA is readily reduced to the more stable product malate or transaminated to aspartate. Plants that have high levels of PEPC protein in their leaves to generate a pool of aspartate or malate as intermediate products of photosynthesis are known as C4 or CAM (Crassulacean acid metabolism) species, as opposed to C3 species that use PEPC primarily in an anaplerotic role. C4 and CAM plants subsequently decarboxylate the pool of C4 acids, spatially or temporally, respectively, to increase the concentration of CO2 around Rubisco. All plant genomes encode several paralogues of PEPC, with only one orthologue being used in the C4 pathway (Hermans and Westhoff, 1990; Lepiniec et al., 1994; Chollet et al., 1996; Svensson et al., 2003; Gowik et al., 2006; Rao et al., 2006; Christin et al., 2007, 2011; Besnard et al., 2009). The functions of non-C4 PEPC paralogues have been summarized recently (O'Leary et al., 2011); they have roles in many plant functions, such as the regulation of stomatal movement, seed development, seed germination, root excretion for abiotic stress acclimation, energy production, carbon storage, nitrogen fixation, and an anaplerotic role in the citric acid cycle. Given the physiological importance of PEPC for C4 photosynthesis, numerous studies have shown that the PEPC used for C4 biochemistry has distinct kinetic properties compared with its orthologues (Ting and Osmond, 1973; Bläsing et al., 2000, 2002; Gowik et al., 2006; Lara et al., 2006; Jacobs et al., 2008; Rao et al., 2008). The exact amino acid residues responsible for the various observed kinetic differences are still being resolved, in order to explain further how PEPC kinetics affect the flux of CO2 assimilation through the C4 pathway under a wide range of changing conditions.

Abbreviations: dN, non-synonymous substitution rate; dS, synonymous substitution rate; G6P, glucose 6-phosphate; LRT, likelihood ratio test; ML, maximum likelihood; OAA, oxaloacetate; PEP, phosphoenolpyruvate; PEPC, phosphoenolpyruvate carboxylase; ω, the non-synonymous/synonymous substitution rate ratio.

PEPC in vascular plants functions as a homotetramer (a dimer of dimers) composed of subunits with a molecular mass of 95-116 kDa, and is allosterically regulated by the metabolic context of the enzyme (Kai et al., 2003). At physiological pH, PEPC is activated by its substrate Mg-PEP (Tovar-Mendez et al., 1998), by phosphorylated sugars such as glucose 6-phosphate (G6P) (Coombs et al., 1973; Wong and Davies, 1973), and by neutral amino acids such as glycine, alanine, and serine (Nishikido and Takanashi, 1973; Bandarian et al., 1992; Tovar-Mendez et al., 2000). Dicarboxylic acids such as the downstream products malate and aspartate inhibit PEPC activity (Huber and Edwards, 1975). Enzyme activity can also be modified by phosphorylation of a conserved N-terminal serine residue, which decreases the affinity for dicarboxylic acids and increases the affinity for PEP (Jiao and Chollet, 1988; Tovar-Mendez et al., 1998). Protein crystal structures of PEPC from Escherichia coli, Zea mays (C4), Flaveria pringlei (C3), and F. trinervia (C4) have been resolved, helping to relate amino acid substitutions to PEPC kinetics (Kai et al., 1999; Matsumura et al., 2002; Paulus et al., 2013). Identifying the amino acids in the PEPC protein that are under the strongest selective pressure after recruitment for C4 biochemistry will further elucidate which changes can potentially enhance C4 photosynthesis, potential metabolic limitations, and the regulatory network underlying plant adaptation.

C4 biochemistry has appeared at least 62 independent times in angiosperms (including 36 lineages in eudicots, six in the sedges, and 18 in the grasses), making it one of the most common convergent processes studied to date (Sage et al., 2011). Among dicot families, the Chenopodiaceae has the largest number of C4 species and the greatest diversity in leaf anatomy, including C4 species with Kranz anatomy, single-cell C4 species, and C3 species (Kadereit et al., 2003; Edwards and Voznesenskaya, 2011). All of the C4 genera of Chenopodiaceae except Atriplex are in the Salicornioideae/Suaedoideae/Salsoloideae/Camphorosmoideae clade (Kadereit et al., 2003; Kadereit and Freitag, 2011). The three different types of C3 leaf anatomy within the genus Suaeda are characterized according to their sections: Brezia, Vera, and Schanginia (Schütze et al., 2003). Four independent origins of C4 photosynthesis are hypothesized within the Suaedoideae, including two separate origins of distinctive Kranz C4 anatomies, in Suaeda sections Salsina sensu lato (s.l.) and Schoberia, and two independent origins of unique single-cell C4 anatomy, in Bienertia and in Suaeda aralocaspica (Kapralov et al., 2006).

Nomenclature for PEPC in higher plants varies throughout the literature, an artifact attributable to the numerous independent characterizations of PEPC genes, to independent gene and genome duplication events that occurred after species divergence, and to non-standardized nomenclature. In the grasses, four PEPC genes have been predominantly characterized; the PEPC gene most often recruited for use in C4 photosynthesis was initially named ppc-C4 and subsequently ppc-B2 (Christin et al., 2007). In the sedges, five PEPC genes have been predominantly characterized, and the gene recruited for use in C4 photosynthesis is labelled ppc-1 (Besnard et al., 2009). In Arabidopsis there are four PEPC genes that were arbitrarily numbered 1-4 (Sanchez and Cejudo, 2003). One of the earliest dicot C4 PEPC genes analysed was in Flaveria (Asteraceae), where three genes were identified and labelled A, B, and C, with the ppc-A gene identified as the gene recruited for use in the C4 photosynthetic pathway (Hermans and Westhoff, 1990; Engelmann et al., 2003). Alphabetical nomenclature for PEPC genes was subsequently used in Alternanthera (Amaranthaceae), where the three characterized PEPC genes were phylogenetically sister to the Flaveria PEPC genes (Gowik et al., 2006). More recent phylogenetic analysis shows that there are two paralogous eudicot PEPC genes, and the one most often recruited for use in the C4 photosynthetic pathway is labelled ppc-1; the exception is Flaveria, where the apparent loss of the ppc-1 gene required the recruitment of a twice-duplicated ppc-2 gene (Christin et al., 2011). Here the nomenclature of Christin et al. (2011) is followed, since this is to date the most detailed phylogenetic study of eudicot PEPC genes, even though only two of the 3-5 eudicot PEPC genes are presented.

Genes from closely related species tend to have high amino acid similarity, and in the case of PEPC, ~10% of residues (~100) are invariant across all PEPC genes (Nakamura et al., 1995). Variation among orthologous proteins results from changes to the nucleotide sequence that alter the codon and the resulting amino acid. Changes to the coding sequence of a gene are often classified as either non-synonymous (dN) substitutions, which alter the resulting amino acid, or synonymous (dS) substitutions, which change the codon but not the amino acid. The direction of selective pressure at each amino acid residue is inferred by comparing the rates of dN and dS substitutions across orthologous proteins, usually expressed as their ratio (ω = dN/dS). Equal rates of both types of substitution (ω = 1) suggest neutral evolution or low selective pressure on the amino acid at that residue. A low rate of non-synonymous relative to synonymous substitution (ω < 1) indicates purifying selection against changes to the amino acid present. A high rate of non-synonymous relative to synonymous substitution (ω > 1) suggests that the new amino acid offers some fitness advantage, probably associated with adaptive change (Yang and Nielsen, 2002; Huelsenbeck and Dyer, 2004). At the molecular level, functional constraints act across the protein, so the selective pressure at most residues is purifying. In most proteins, a functional difference is often the result of positive selection at only a few sites (Yang et al., 2005; Christin et al., 2008; Kapralov et al., 2012). Phylogenetic analysis in the grasses showed that ppc-B2 was recruited eight independent times for use in the C4 pathway (Christin et al., 2007). During this switch to C4, 21 amino acids evolved under positive selection and converged to similar or identical amino acids, some of which have also been recorded in non-grass C4 species (Bläsing et al., 2000; Gowik et al., 2006; Christin et al., 2007, 2011; Besnard et al., 2009). Convergence at some sites, such as serine at position 780, appears to reflect a need for specific amino acid residues for C4 function, whereas at other sites there appears to be a requirement for loss of the C3-associated amino acid. These substitutions are thought to optimize PEPC for C4 photosynthetic function.
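To illustrate how individual codon changes feed into dN, dS, and ω, the Python sketch below classifies a single-nucleotide codon change as synonymous or non-synonymous (using the standard genetic code) and as a transition or a transversion (the distinction captured by the κ parameter in codeml). The helper names are hypothetical and not part of PAML, and the 0.05 tolerance in `interpret_omega` is an arbitrary assumption: in practice ω is assessed with likelihood ratio tests rather than a fixed cut-off.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
PURINES = {"A", "G"}


def classify_point_change(codon_a: str, codon_b: str) -> Tuple[str, str]:
    """Classify a codon change at exactly one nucleotide position.

    Returns (effect, mutation): effect is 'synonymous' or 'non-synonymous';
    mutation is 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine).
    """
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three nucleotides long")
    differences = [(x, y) for x, y in zip(codon_a, codon_b) if x != y]
    if len(differences) != 1:
        raise ValueError("codons must differ at exactly one position")
    x, y = differences[0]
    same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    effect = "synonymous" if same_aa else "non-synonymous"
    mutation = "transition" if (x in PURINES) == (y in PURINES) else "transversion"
    return effect, mutation


def interpret_omega(dn: float, ds: float, tolerance: float = 0.05) -> str:
    """Label omega = dN/dS as 'purifying', 'neutral', or 'positive' (illustrative cut-off)."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"
```

For example, `classify_point_change("GCT", "TCT")` returns `("non-synonymous", "transversion")`, one possible codon path for an alanine-to-serine change such as the one discussed at position 780.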

The specific aims of this study are to: (i) find additional phylogenetic support for the four origins of C4 photosynthesis in Suaedoideae by using PEPC sequence data; (ii) identify which PEPC paralogous gene is being recruited for use in C4 photosynthesis in comparison with other previously characterized PEPC genes; (iii) identify any amino acids under positive selection in the PEPC gene used in C4 photosynthesis; and (iv) determine the spatial location of positively selected amino acids in relation to catalytic and allosteric regulatory amino acids.



Materials and methods 

Plant material 

All plants used in this study were started from seed and grown in controlled environment chambers (Econair GC-16; Bio Chambers). Seedlings were started under lower light [photosynthetic photon flux density (PPFD) of 100 µmol quanta m−2 s−1] and lower temperature, with a day/night temperature of 25/22 °C and a photoperiod of 14/10 h, and were then moved to high light and temperature conditions (PPFD of 1000 µmol quanta m−2 s−1, day/night temperature of 35/25 °C, and photoperiod of 14/10 h) once plants were well established. A few leaves from 2- to 6-month-old plants were used for DNA extraction.

DNA sequencing 

The PEPC gene ppc-1 was sequenced for two Bienertia species and 18 Suaeda species, with two Salsola species sequenced as outgroups for the selection analyses. The PEPC gene ppc-2 was sequenced for 12 Suaeda species. DNA was extracted from 250 mg of plant material using the CTAB (cetyltrimethylammonium bromide) method following the protocol of Doyle and Doyle (1987). Primers were designed to amplify exons 8-10, a region similar to those previously analysed for positive selection (Supplementary Table S1 available at JXB online). Initial PCR conditions were 2 min at 95 °C, followed by 35 cycles of 30 s at 95 °C, 30 s annealing at 52 °C, and a 3 min extension at 72 °C. The PCR product was visualized and purified using a PCR clean-up kit according to the manufacturer's protocol (Qiagen, USA). Purified PCR product was cloned into the pGEM T-Easy vector using the manufacturer's protocol (Promega, USA). Single colonies were grown overnight and plasmid DNA was purified using alkaline lysis with SDS (Sambrook and Russell, 2001). Plasmid inserts were PCR amplified using GoTaq (Promega, USA) with SP6 and T7 primers, and visualized on a gel. Prior to sequencing, the PCR product was mixed with 2.5 U of Antarctic phosphatase and 4 U of exo-sap nuclease in Antarctic phosphatase buffer (New England Biolabs, USA) to degrade primers and nucleotides, and subsequently diluted 1:10. Sequencing reactions were performed using the BigDye Terminator master mix v3.1 (Applied Biosystems, USA), using sequence-specific internal primers along with SP6 and T7 (Supplementary Table S1). Sequencing was carried out by Operon Sequences (USA) and at the Washington State University genomics core. Sequence data were assembled using Sequencher software (USA). Nucleotide sequences were translated, aligned, and visualized using Se-Al and MacVector (USA). All sequences were deposited in GenBank (Supplementary Table S2).

Phylogenetic analyses 

Three DNA sequence data matrices were analysed in this study: (i) the previously published matrix of the nuclear ribosomal internal transcribed spacer (ITS) and the chloroplast intergenic spacers atpB-rbcL and psbB-psbH from Kapralov et al. (2006) (Supplementary Table S3 at JXB online), combined with the third nucleotide of each codon and the introns of ppc-1 for the subsample of species sampled here; (ii) ppc-1 and ppc-2 coding sequences from Suaedoideae sequenced here and other eudicot sequences from GenBank; and (iii) ppc-1 third position sites and introns of the suaedoids sequenced in this study, with two ppc-1 Salsola samples used as the outgroup. Alignments of PEPC genes and their introns were conducted with MUSCLE (Edgar, 2004) and visually inspected using Se-Al. Maximum likelihood (ML) analyses were performed using the GTRGAMMA model in RAxML version 7.2.8 (Stamatakis, 2006; Silvestro and Michalak, 2012) with 1000 bootstrap replicates, using multithreading on four cores.
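Matrices (i) and (iii) use only the third codon positions of ppc-1 (together with introns), and matrix (i) combines them with other markers sampled for a different set of taxa. The Python sketch below shows one way to extract third positions from a codon alignment and to concatenate matrices, padding taxa missing from a marker with `?`. The function names, the `?` symbol, and the assumption that the alignment is in reading frame with gaps inserted in whole codons are illustrative; the published matrices were assembled with MUSCLE and Se-Al.

```python
from typing import Dict

Matrix = Dict[str, str]  # taxon name -> aligned sequence


def third_codon_positions(aligned_cds: Matrix, frame_offset: int = 0) -> Matrix:
    """Return only the third codon position of each aligned coding sequence.

    Assumes the alignment is in frame starting at column `frame_offset`
    (0-based) and that gaps occur in whole codons.
    """
    lengths = {len(seq) for seq in aligned_cds.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    start = frame_offset + 2  # third base of the first codon
    return {taxon: seq[start::3] for taxon, seq in aligned_cds.items()}


def concatenate(*matrices: Matrix, missing: str = "?") -> Matrix:
    """Concatenate matrices, padding taxa absent from a matrix with `missing`."""
    taxa = sorted(set().union(*(m.keys() for m in matrices)))
    combined = {taxon: [] for taxon in taxa}
    for matrix in matrices:
        if not matrix:
            continue
        width = len(next(iter(matrix.values())))
        for taxon in taxa:
            combined[taxon].append(matrix.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in combined.items()}
```

Here `concatenate(third_codon_positions(ppc1), introns, its)` (hypothetical variable names) would yield one row per taxon, ready to be written out for RAxML.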



Positive selection analysis 

Positive selection analysis used the ML tree from the ppc-1 suaedoid data set analysis described above, consisting of eight C3 species, nine Kranz C4 species, and three single-cell C4 species, of which Bienertia sinuspersici had two ppc-1 accessions. Two Salsola species (one C3 and one C3-C4 intermediate) were used as outgroups. To test for positive selection at particular sites of the ppc-1 gene, the codeml program in the PAML v4.4 package was used to perform likelihood ratio tests (LRTs) to identify the best model of codon change while concurrently identifying non-synonymous amino acid changes that are under positive selection (Yang, 2007). Each successive model adds parameters in an attempt to fit the data better, by allowing ω to vary among sites either across the whole phylogeny (site tests) or on pre-specified C4 branches only (branch-site tests). A comparison of LRT scores shows which model fits the data best. Details of each model are as follows. Model M0 allows a single ω value across the whole phylogenetic tree at all sites. Subsequent models allow ω to vary among sites. Model M1a (nearly neutral) allows two classes of sites, one with 0 < ω0 < 1 and one with ω1 = 1, while Model M2a (positive selection) is the same as Model M1a but adds a third class with ω2 > 1. Model M8a assumes a discrete beta distribution for ω constrained between 0 and 1, plus a class with ω = 1; Model M8 allows the same distribution as M8a but has an extra class under positive selection with ω > 1. Branch-site tests, using pre-specified branches where changes associated with C4 photosynthesis are hypothesized to have occurred (foreground branches), were made with the null Model A1. This model allows ω ratios to vary among sites and among lineages, and also provides two additional classes of codons with ω = 1 along the pre-specified foreground branches, while restricting ω on background branches to either 0 < ω0 < 1 or ω1 = 1. The alternative Model A allows ω to be between 0 and 1 or equal to 1 on all branches, and also has two additional classes of codons under positive selection with ω > 1 along the pre-specified foreground branches, while restricting ω on background branches to either 0 < ω0 < 1 or ω1 = 1. C4 lineages were marked as foreground branches.
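Each of the models above corresponds to a few settings in the codeml control file. The Python sketch below writes a minimal control file for each model. The file names are hypothetical, and values not stated in the text (for example `CodonFreq = 2`, the F3x4 codon frequency model, and the starting κ) are assumptions. For the branch-site models, the foreground branches are marked with `#1` in the tree file.

```python
from typing import Dict, Optional

# Model-specific codeml settings. M8a is M8 with omega fixed at 1;
# A1 is branch-site model A with the foreground omega fixed at 1.
MODEL_SETTINGS: Dict[str, Dict[str, str]] = {
    "M1a": {"model": "0", "NSsites": "1", "fix_omega": "0", "omega": "0.5"},
    "M2a": {"model": "0", "NSsites": "2", "fix_omega": "0", "omega": "1.5"},
    "M8a": {"model": "0", "NSsites": "8", "fix_omega": "1", "omega": "1"},
    "M8": {"model": "0", "NSsites": "8", "fix_omega": "0", "omega": "1.5"},
    "A1": {"model": "2", "NSsites": "2", "fix_omega": "1", "omega": "1"},
    "A": {"model": "2", "NSsites": "2", "fix_omega": "0", "omega": "1.5"},
}


def codeml_ctl(model_name: str, seqfile: str, treefile: str, outfile: str,
               initial_omega: Optional[float] = None) -> str:
    """Return the text of a codeml control file for one site or branch-site model."""
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,   # foreground branches labelled #1 for models A and A1
        "outfile": outfile,
        "noisy": "3",
        "verbose": "1",
        "runmode": "0",         # use the user-supplied tree
        "seqtype": "1",         # codon sequences
        "CodonFreq": "2",       # F3x4 (assumed)
        "clock": "0",
        "fix_kappa": "0",       # estimate the transition/transversion ratio
        "kappa": "2",           # starting value (assumed)
        "cleandata": "0",
    }
    settings.update(MODEL_SETTINGS[model_name])
    if initial_omega is not None:
        # Rerunning with different starting omega values helps detect local optima.
        if settings["fix_omega"] == "1":
            raise ValueError(f"omega is fixed in the null model {model_name}")
        settings["omega"] = str(initial_omega)
    width = max(len(key) for key in settings)
    lines = (f"{key:>{width}} = {value}" for key, value in settings.items())
    return "\n".join(lines) + "\n"
```

Writing, for example, the A1 and A files for the same alignment and labelled tree and running codeml on each gives the pair of log-likelihoods compared in one branch-site LRT.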

For all LRTs, the null model is a simplified version of the selection model with fewer parameters, and is thus expected to provide a poorer fit to the data (lower maximum likelihood). The null models (M1a, M8a, and A1) do not allow codons with ω > 1, whereas the selection models (M2a, M8, and A) are alternative models that allow codons with ω > 1. The significance of the LRTs was calculated assuming that twice the difference in the log of maximum likelihood between the two models was distributed as a χ2 distribution with degrees of freedom (df) given by the difference in the number of parameters between the two models (Yang and Nielsen, 2002; Yang and Swanson, 2002). For the M1a-M2a comparison df=2, and for the M8a-M8 and A1-A comparisons df=1. Each LRT was run at least twice using different initial ω values to check for suboptimal local peaks. To identify amino acid sites potentially under positive selection, the parameter estimates from the M2a, M8, and A models were used to calculate the posterior probability that an amino acid belongs to a class with ω > 1, using the Bayes empirical Bayes (BEB) approach implemented in PAML (Yang et al., 2005).
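The LRT and the per-site posterior calculation described above can be sketched in a few lines of Python. The χ2 tail probability uses closed forms that hold only for 1 and 2 degrees of freedom, the two cases used here. The posterior function implements the simpler naive empirical Bayes (NEB) calculation; BEB, as implemented in PAML, additionally accounts for sampling error in the parameter estimates, so this illustrates the idea rather than reimplementing PAML. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable (df = 1 or 2 only)."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Return (2 * delta lnL, P-value) for a nested null/alternative model pair."""
    # A negative difference would indicate a failed optimisation; treat it as 0.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


def site_class_posteriors(proportions: Sequence[float],
                          site_likelihoods: Sequence[float]) -> List[float]:
    """Naive empirical Bayes posterior of each site class for one codon.

    proportions: estimated proportion p_k of each site class (summing to 1)
    site_likelihoods: likelihood of this codon's data under each class's omega
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per site class")
    weighted = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

Applied to the log-likelihoods in Table 1, `likelihood_ratio_test(-6742.34, -6741.7, df=2)` gives 2ΔL = 1.28 and P ≈ 0.527 (M1a vs M2a), and `likelihood_ratio_test(-6743.02, -6740.28, df=1)` gives 2ΔL = 5.48 and P ≈ 0.019 (M8a vs M8).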



Structural analysis of PEPC 

The recently published PEPC (ppc-2) protein structure from the C4 species F. trinervia (Asteraceae) (Paulus et al., 2013) was obtained from the RCSB Protein Data Bank (www.rcsb.org, 3ZGB). Throughout this paper, PEPC residues are numbered according to the maize ppc-B2 sequence CAA33317 (Besnard et al., 2003) for easy comparison with previous studies. The properties, locations, and spatial relationships of individual amino acids in the PEPC structure were analysed using CUPSAT (Parthiban et al., 2006) and PyMOL (Schrödinger). Figures of the atomic structures and distance measurements were made with PyMOL and formatted using Adobe Photoshop CS5 (USA).
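Distances between residues were measured in PyMOL. As a rough equivalent without PyMOL, the Python sketch below reads Cα coordinates from the fixed-column ATOM records of a PDB-format file (such as an entry downloaded from the RCSB PDB) and computes Euclidean distances. Note the assumption: residue numbers are taken from the file itself, which need not match the maize numbering used in this paper, so maize-numbered positions would first have to be mapped through an alignment. The function names and the default chain `A` are illustrative.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_lines: Iterable[str], chain: str = "A") -> Dict[int, Coord]:
    """Map residue number -> C-alpha (x, y, z) for one chain of a PDB-format file.

    Uses the fixed PDB columns: atom name 13-16, altLoc 17, chain 22,
    residue number 23-26, and x, y, z in 31-38, 39-46, 47-54.
    """
    coords: Dict[int, Coord] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        residue = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(residue, xyz)
    return coords


def ca_distance(coords: Dict[int, Coord], res_a: int, res_b: int) -> float:
    """Euclidean distance between two C-alpha atoms (angstroms for PDB files)."""
    return math.dist(coords[res_a], coords[res_b])
```

A typical call would be `ca = ca_coordinates(open("3zgb.pdb"))` followed by `ca_distance(ca, i, j)` for two residue numbers taken from the structure file.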



Results 

Phylogenetic analyses 

Analysis of third position sites and introns of ppc-1, in combination with previous ITS, atpB-rbcL, and psbB-psbH sequence data, provided additional support for relationships in Suaedoideae showing four independent origins of C4. The ML phylogenetic tree (Fig. 1) for 46 Suaedoideae species and four outgroup species agrees with previously obtained phylogenies for this subfamily (Schütze et al., 2003; Kapralov et al., 2006). Previous phylogenetic analyses within Suaeda provided a strongly supported hypothesis of relationships in the clade, with the exception of two branches: the branch grouping Schoberia + Alexandra with Physophora and the branch grouping Schanginia + Borszczowia with Suaeda both had bootstrap support <50% (Kapralov et al., 2006). With the addition of ppc-1 data to the previously analysed data sets, a similar but generally much more strongly supported phylogeny suggests a grade of Suaeda, Schanginia + Borszczowia, Schoberia + Alexandra, Physophora, and Salsina clades (Fig. 1). Among the clades, the only branch grouping major clades that does not have >90% bootstrap support is the Physophora + Salsina branch (bootstrap support = 69%). Despite this, these results continue to strongly support four independent gains of C4 photosynthesis within the Suaedoideae, including two parallel origins of distinctive Kranz C4 anatomy in Suaeda sections Salsina s.l. and Schoberia, and two independent origins of unique single-cell C4 anatomy in Bienertia and Suaeda aralocaspica.

The phylogenetic relationships of eudicot PEPC genes, based on exons 8, 9, and 10 and rooted on the monocot ppc-B2 maize gene, show that the ppc-1 gene recruited for use in C4 photosynthesis in Suaedoideae is the same orthologous gene that has previously been shown to be recruited for use in C4 photosynthesis in other C4 eudicot species (Fig. 2). The exception to this parallel recruitment of the same PEPC gene is found in the Asteraceae, where the paralogous ppc-2 gene is recruited for use in C4 photosynthesis. There is strong bootstrap support for all Suaedoideae ppc-1 genes being sister to closely related Amaranthaceae ppc-1 genes.

Positive selection analyses of the PEPC coding sequence

The sequenced region of ppc-1 and ppc-2 includes exons 8, 9, and 10 and accounts for 511 of the 973 (53%) amino acids present in the PEPC protein. Comparison of the ppc-1 gene across the 22 species showed that 372 of these 511 (73%) amino acids are conserved. The same sequenced region for the ppc-2 gene, for 12 of the 22 sampled species, showed that 449 of the 506 (88%) amino acids are conserved.
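The conservation figures above are column counts over a protein alignment. The Python sketch below counts alignment columns in which every sequence carries the same amino acid. Treating any column that contains a gap as variable is an assumption, since the text does not state how gaps were handled, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def conserved_positions(protein_alignment: Dict[str, str],
                        gap: str = "-") -> Tuple[int, int, float]:
    """Count invariant columns in an amino acid alignment.

    Returns (conserved columns, total columns, percentage conserved).
    A column is conserved only if all sequences share one residue and
    none of them has a gap.
    """
    sequences = list(protein_alignment.values())
    if not sequences:
        raise ValueError("alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same aligned length")
    conserved = sum(
        1 for column in zip(*sequences)
        if len(set(column)) == 1 and column[0] != gap
    )
    return conserved, length, 100.0 * conserved / length
```

With the ppc-1 counts reported above, 372 conserved columns out of 511 corresponds to 100 × 372/511 ≈ 72.8%, reported as 73%.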

A phylogenetic tree generated from intron and third position sites was used to obtain a supported tree that is minimally affected by selective pressures (Supplementary Fig. S1 at JXB online). The tree had 23 tips, comprising the 22 sampled species and an additional tip for the variation found at residue 780 in the B. sinuspersici ppc-1 gene (denoted FAW for an alanine residue, or FSW for a serine residue, at position 780). Codons under positive selection were identified using the software package PAML, which provided a likelihood score for each model that was subsequently used to test for a significant difference between the null and selection models. The LRT showed that site model M2a did not fit the data significantly better than M1a (P = 0.5273), whereas M8 fitted the data better than M8a (P = 0.0192) (Table 1).

To test whether codon selection occurs specifically in C4 clades, the branch-site Model A, which allows ω > 1 along foreground branches (branches where C4-specific changes were hypothesized to occur), was compared with the null Model A1, which does not allow ω > 1 on either foreground or background branches. This comparison was made by labelling foreground branches in four different ways: (i) only the branches leading to C4 clades; (ii) only the branches leading to Kranz C4 clades; (iii) only the branches leading to single-cell C4 clades; or (iv) all branches within C4 clades. Comparing the selection Model A with the null Model A1 showed that labelling just the branches leading to C4 clades gave the most significant result (P = 0.0001; Table 1). Labelling just the branches leading to Kranz C4 clades, as well as all branches within all C4 clades, also produced significant results (P = 0.0091 and P = 0.0242, respectively; Table 1). The likelihood scores of the selection and null models were identical when only the foreground branches leading to single-cell C4 clades were labelled (Table 1). The sites identified as under selection varied somewhat depending on how the foreground branches were labelled.

Fig. 1. Suaedoideae phylogeny using ITS, atpB-rbcL, psbB-psbH, and ppc-1 third position plus intron sequences. Numbers above branches are bootstrap percentages. Clades leading to C4 photosynthesis are highlighted in red. Taxa used for positive selection analysis are indicated with an asterisk.

Sites under positive selection 

No codons were identified as being under positive selection with a posterior probability >0.95 by BEB in the M2a or M8 model (Table 1). Two codons (positions 733 and 868) were shown to be under positive selection with a posterior probability >0.95 by BEB when only branches leading to C4 clades were labelled as foreground branches, with position 733 being the only residue identified with a posterior probability >0.99 by BEB in Model A (Table 1). Both sites had four amino acids present in this data set: one found mostly in C3 species, and three alternatives found in C4 species (Table 2). Only at position 733 did all the sampled C4 species carry a substitution from the C3 amino acid, while at position 868 all sampled C4 species except the two Bienertia species had a substitution. Model A identified three codons (485, 519, and 735) under positive selection with a posterior probability >0.95 by BEB when only branches leading to Kranz C4 clades were labelled as foreground branches (Table 1). Codons 485 and 735 each had two amino acids present in the data set, and all species in the Salsina clade had a substitution to the C4 amino acid (Table 2; Supplementary Table S4 at JXB online). Codon 519 also had two amino acids present in the data set, and a substitution to the C4 amino acid was present only in the two Schoberia species sampled (Table 2; Supplementary Table S4). No codons were identified as being under positive selection specifically on branches leading to single-cell C4 clades (Table 1). When all C4 branches were labelled as foreground branches, 13 codons were identified as being under positive selection (480, 513, 627, 662, 695, 707, 733, 744, 794, 863, 868, 880, and 931) with a posterior probability >0.95 by BEB (Table 1). Two of these residues (733 and 868) were also identified when only the branches leading to C4 clades were labelled.

Fig. 2. Eudicot PEPC exon 8, 9, and 10 phylogeny rooted on the monocot ppc-B2 maize gene. Accession numbers are indicated after species names for sequences retrieved from GenBank. Paralogous PEPC genes (ppc-1 and ppc-2) and eudicot families are indicated on the right. Branches leading to C4 clades are in red. Numbers above branches are bootstrap percentages.

Table 1. Comparison of models of amino acid change in the Suaedoideae ppc-1 gene, used to identify sites under positive selection

| Model with positive selection (a) | Log-likelihood | Parameters (c) | Positively selected sites (d) | Null model (a) | Log-likelihood | Parameters (c) | LRT (b): 2ΔL | P-value |
|---|---|---|---|---|---|---|---|---|
| Analysis for positively selected sites common to C3 and C4 clades | | | | | | | | |
| M2a | -6741.7 | κ=2.94, p0=0.82, ω0=0.07, ps=0.003, ωs=3.24 | None | M1a | -6742.34 | κ=2.92, p0=0.82, ω0=0.07 | 1.28 | 0.5273 |
| M8 | -6740.28 | κ=2.89, p0=0.98, p=0.24, q=1.0, ωs=1.93 | None | M8a | -6743.02 | κ=2.86, p0=0.96, p=0.19, q=0.7 | 5.48 | 0.0192 |
| Analysis for positive selection along branches leading to C4 clades | | | | | | | | |
| A | -6729.78 | κ=2.95, p0=0.81, ω0=0.08, ps=0.04, ωs=4.2 | 733, 868 | A1 | -6737.24 | κ=2.89, p0=0.77, ω0=0.06 | 14.9 | 0.0001 |
| Analysis for positive selection along branches leading to Kranz C4 clades | | | | | | | | |
| A | -6732.25 | κ=2.94, p0=0.77, ω0=0.06, ps=0.06, ωs=3.64 | 485, 519, 735 | A1 | -6735.67 | κ=2.91, p0=0.71, ω0=0.06 | 6.8 | 0.0091 |
| Analysis for positive selection along branches leading to single-cell C4 clades | | | | | | | | |
| A | -6742.34 | κ=2.92, p0=0.82, ω0=0.07, ps=0.00, ωs=NA | None | A1 | -6742.34 | κ=2.92, p0=0.82, ω0=0.07 | 0 | 1 |
| Analysis for positive selection along all C4 branches | | | | | | | | |
| A | -6702.46 | κ=2.91, p0=0.74, ω0=0.05, ps=0.18, ωs=1.5 | 480, 513, 627, 662, 695, 707, 733, 744, 794, 863, 868, 880, 931 | A1 | -6705.34 | κ=2.86, p0=0.69, ω0=0.04 | 5.08 | 0.0242 |

(a) M1a (nearly neutral), M2a (positive selection), M8a (beta and ω=1), and M8 (beta and ω>1) are PAML site models; A1 and A are PAML branch-site models.
(b) LRT is the likelihood ratio test; 2ΔL is twice the difference of the model log-likelihoods.
(c) κ is the transition/transversion rate ratio; ω0 is the dN/dS ratio of codons with ω<1; ωs is the dN/dS ratio in the class under putative positive selection; p0 and ps are the proportions of codons with ω<1 and ω>1, respectively; p and q are parameters of the beta distribution in the range (0, 1).
(d) Sites listed are those at which positive selection is detected with a BEB posterior probability >0.95; in the analysis of branches leading to C4 clades, site 733 also exceeded 0.99.

The frequency of substitution from the C3 amino acid to a C4 amino acid varies considerably among residues across the data set. Residues with a low frequency of substitution are found at positions 695 and 931, where only the two Bienertia species have a different amino acid, and at positions 662 and 744, where only three or four Kranz C4 species, respectively, have a different amino acid (Table 2; Supplementary Table S4 at JXB online).
Residues that have a high frequency of substitutions, but not in all species, are found at positions 480, 513, 627, 794, and 863 (Table 2; Supplementary Table S4). Additionally, sites under selection differ in the degree of parallelism. For example, residues at positions 480, 485, 513, 627, and 735 all changed to an identical amino acid along different ppc-1 lineages, while residues at positions 707, 733, 794, 863, 868, and 880 changed to different amino acids (Table 2). This suggests that the C4 characteristics might be conferred either by a change to a specific amino acid or by the absence of a particular amino acid.

Table 2. Characteristics of amino acid replacements under positive selection in the C4 lineages of Suaedoideae

| AA no.ᵃ | AA change | Type of changeᵇ | ΔHᶜ | ΔPᵈ | ΔVᵉ | SAᶠ (%) | ΔGᵍ (kJ mol⁻¹) | No. of C3/no. of C4ʰ | No. of transitions for C4 aa | No. of transversions for C4 aa | Location of residue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 480 | D→E | D→D | 0.0 | −0.7 | 27.3 | 92.7 | S (0.15) | 0/11 | – | 1 | α-Helix 18 |
| 485ⁱ | K→A | R→A | 5.7 | −3.2 | −80.0 | 40.3 | DS (−1.12) | 0/7 | 1 | 1 | α-Helix 19 |
| 513 | D→A | D→A | 5.3 | −4.9 | −22.5 | 18.7 | DS (−1.17) | 0/9 | – | 1 | α-Helix 20 |
| 519ʲ | H→K | R→R | −0.7 | 0.9 | 15.4 | 46.7 | S (2.04) | 0/2 | – | 2 | α-Helix 20 |
| 627 | V→I | A→A | 0.3 | −0.7 | 26.7 | 10.3 | S (1.4) | 0/9 | 1 | – | α-Helix 25 |
| 662 | D→E | D→D | 0.0 | −0.7 | 27.3 | 45.1 | DS (−1.11) | 0/3 | – | 1 | Loop |
| 695 | T→V | P→A | 4.9 | −2.7 | 23.9 | 0.7 | DS (−4.39) | 0/1 | 2 | – | α-Helix 28 |
| | T→I | P→A | 5.2 | −3.4 | 50.6 | 0.7 | DS (−0.97) | 0/1 | 1 | – | α-Helix 28 |
| 707 | I→M | A→A | −2.6 | 0.5 | −3.8 | 47.6 | DS (−0.64) | 0/1 | 1 | – | Loop |
| | I→L | A→A | −0.7 | −0.3 | 0.0 | 47.6 | DS (−1.55) | 0/2 | – | 1 | Loop |
| | I→S | A→P | −5.3 | 4.0 | −77.7 | 47.6 | S (0.38) | 0/1 | 1 | – | Loop |
| | I→T | A→P | −5.2 | 3.4 | −50.6 | 47.6 | DS (−0.3) | 0/1 | 1 | – | Loop |
| 733 | F→M | F→A | −0.9 | 0.5 | −27.0 | 39.9 | DS (−3.73) | 0/2 | 1 | 1 | Loop |
| | F→L | F→A | 1.0 | −0.3 | −23.2 | 39.9 | DS (−3.14) | 0/8 | – | 1 | Loop |
| | F→R | F→R | −7.3 | 5.3 | −16.5 | 39.9 | DS (−2.42) | 0/2 | 1 | 1 | Loop |
| 735ⁱ | E→N | D→P | 0.0 | −0.7 | −24.3 | 48.7 | DS (−0.76) | 0/7 | 1 | 1 | Loop |
| 744 | L→C | A→P | −1.3 | 0.6 | −58.2 | 29.5 | DS (−2.63) | 0/2 | – | 2 | α-Helix 30 |
| | L→R | A→R | −8.3 | 5.6 | 6.7 | 29.5 | DS (−4.14) | 0/2 | – | 1 | α-Helix 30 |
| 780 | A→S | A→P | −2.6 | 1.1 | 0.4 | 0.0 | DS (−3.1) | 0/4 | – | 1 | α-Helix 32 |
| 794 | F→I | F→A | 1.7 | 0.0 | −23.2 | 0.0 | DS (−2.03) | 0/6 | – | 1 | α-Helix 34 |
| | F→M | F→A | −0.9 | 0.5 | −27.0 | 0.0 | DS (−2.51) | 0/2 | 1 | 1 | α-Helix 34 |
| | F→L | F→A | 1.0 | −0.3 | −23.2 | 0.0 | DS (−0.87) | 0/1 | – | 1 | α-Helix 34 |
| 863 | S→K | A→R | 0.6 | 2.1 | 79.6 | 15.7 | DS (−0.22) | 0/1 | 1 | 1 | α-Helix 38 |
| | S→D | | −2.7 | 3.8 | 22.1 | 15.7 | S (0.07) | 0/1 | 2 | – | α-Helix 38 |
| | S→N | A→P | −2.7 | 2.4 | 5.1 | 15.7 | S (2.2) | 0/9 | 1 | – | α-Helix 38 |
| | S→T | P→P | 0.1 | −0.6 | 27.1 | 15.7 | DS (−0.43) | 0/1 | – | 1 | α-Helix 38 |
| 868 | K→R | R→R | −0.6 | −0.8 | 4.8 | 16.5 | DS (−0.58) | 0/2 | 1 | – | α-Helix 38 |
| | K→Q | R→P | 0.4 | −0.8 | −24.8 | 16.5 | DS (−0.07) | 0/1 | – | 1 | α-Helix 38 |
| | K→L | R→A | 7.7 | −3.4 | −1.9 | 16.5 | DS (−0.88) | 0/7 | – | 2 | α-Helix 38 |
| 880 | D→N | D→P | 0.0 | −1.4 | 3.0 | 46.5 | DS (−1.49) | 0/5 | 1 | – | Loop |
| | D→E | D→D | 0.0 | −0.7 | 27.3 | 46.5 | DS (−0.22) | 0/1 | – | 1 | Loop |
| | D→Y | D→A | 2.2 | −6.8 | 82.5 | 46.5 | DS (−3.79) | 0/1 | – | 1 | Loop |
| 931 | M→I | A→A | 2.6 | −0.5 | 3.8 | | | 0/2 | 1 | – | α-Helix 39 |

ᵃ Amino acid (AA) numbering is based on the maize sequence after Hudspeth and Grula (1989).
ᵇ Side chain type changes: A, non-polar aliphatic; P, polar uncharged; D, polar negatively charged; R, polar positively charged.
ᶜ Hydropathicity difference (Kyte and Doolittle, 1982).
ᵈ Polarity difference (Grantham, 1974).
ᵉ van der Waals volume difference (Zamyatin, 1972).
ᶠ Solvent accessibility calculated using the Flaveria trinervia ppc-2 structure (pdb file 3ZGE) by CUPSAT (Parthiban et al., 2006).
ᵍ Overall stability of the protein predicted using the F. trinervia ppc-2 structure (pdb file 3ZGE) by CUPSAT (Parthiban et al., 2006): DS, destabilizing; S, stabilizing.
ʰ Number of C3 or C4 Suaedoideae species that have the indicated amino acid substitution (amino acid on right side of arrow).
ⁱ Specific for Salsina Kranz anatomy.
ʲ Specific for Schoberia Kranz anatomy.
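The grouping of sites by degree of parallelism can be reproduced from Table 2. The sketch below transcribes the derived C4 residues and C4 species counts from the table (maize numbering) and separates sites at which every substituted C4 species carries the same residue from sites with more than one derived residue. It is an illustrative helper with hypothetical names, not part of the original analysis, and it lists every qualifying site, whereas the text gives examples only.

```python
from collections import Counter, defaultdict

# (site, C3 residue, derived C4 residue, no. of C4 species), transcribed from Table 2
TABLE2 = [
    (480, "D", "E", 11), (485, "K", "A", 7), (513, "D", "A", 9),
    (519, "H", "K", 2), (627, "V", "I", 9), (662, "D", "E", 3),
    (695, "T", "V", 1), (695, "T", "I", 1),
    (707, "I", "M", 1), (707, "I", "L", 2), (707, "I", "S", 1), (707, "I", "T", 1),
    (733, "F", "M", 2), (733, "F", "L", 8), (733, "F", "R", 2),
    (735, "E", "N", 7), (744, "L", "C", 2), (744, "L", "R", 2),
    (780, "A", "S", 4),
    (794, "F", "I", 6), (794, "F", "M", 2), (794, "F", "L", 1),
    (863, "S", "K", 1), (863, "S", "D", 1), (863, "S", "N", 9), (863, "S", "T", 1),
    (868, "K", "R", 2), (868, "K", "Q", 1), (868, "K", "L", 7),
    (880, "D", "N", 5), (880, "D", "E", 1), (880, "D", "Y", 1),
    (931, "M", "I", 2),
]


def classify_parallelism(rows):
    """Split sites into those with a single derived C4 residue and those with several."""
    derived = defaultdict(Counter)  # site -> {derived residue: no. of C4 species}
    for site, _c3, c4, n_species in rows:
        derived[site][c4] += n_species
    identical = sorted(s for s, aas in derived.items() if len(aas) == 1)
    divergent = sorted(s for s, aas in derived.items() if len(aas) > 1)
    return identical, divergent, derived


identical, divergent, derived = classify_parallelism(TABLE2)
print("same derived residue:", identical)
print("different derived residues:", divergent)
print("site 733:", dict(derived[733]))
```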



Spatial analysis of the sites identified to be under positive selection indicates that most of them are found around the enzyme reaction site/hydrophobic pocket (Fig. 3). Residues close to the G6P-binding pocket were not analysed in this study since they are encoded by exons 2, 3, and 4. Some residues under positive selection (794, 863, 868, and 880) are in the vicinity of the allosteric site for aspartate and malate regulation (Fig. 3). Two residues identified to be under positive selection (480 and 485) are far from the reaction site and are most probably involved in the dimer-dimer interaction. Interestingly, no sites in the β-barrel structure were identified to be under selection; all sites were located in an α-helix or a connecting loop (Table 2).

Fig. 3. Cartoon representation of the C4 PEPC enzyme structure of Flaveria trinervia (ppc-2 gene) (Paulus et al., 2013), showing the location of functional residues in comparison with sites that were identified to be under positive selection in Suaedoideae. Green residues are sites under positive selection; biochemically essential residues are in shades of red (histidine 177 is bright red, Mg2+-binding sites are ruby, PEP- and HCO3−-binding sites are dark red, Lys606 is brown); allosteric regulatory sites are in shades of blue (deep-blue residues are glucose 6-phosphate-binding sites, light blue are malate/aspartate-binding sites); Gly890 is orange, and Ser780 is yellow (maize numbering). Residue 733 is indicated with a white arrow. The proposed residue function is adapted from Kai et al. (2003).
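Statements about where residues sit in the PEPC structure (near the reaction site, near the allosteric site, or close to Lys606) rest on interatomic distances measured in a coordinate file such as 3ZGE. The sketch below shows one simple way to compute the smallest interatomic distance between two residues from PDB ATOM records, using the fixed column positions of the PDB format. The helper names, residue numbers, and coordinates are invented for illustration and are not taken from 3ZGE; note also that 3ZGE follows Flaveria numbering, so maize-numbered residues must first be mapped (for example, maize Lys606 corresponds to Flaveria Lys600).

```python
import math


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-column PDB ATOM record (columns 1-54 only)."""
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text, chain):
    """Group atom coordinates by residue number for one chain."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[21] != chain:
            continue
        res_seq = int(line[22:26])  # columns 23-26: residue sequence number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(res_seq, []).append(xyz)
    return residues


def min_distance(residues, res_a, res_b):
    """Smallest interatomic distance (in angstroms) between two residues."""
    return min(math.dist(p, q) for p in residues[res_a] for q in residues[res_b])


# Invented coordinates: residue 600 stands in for a lysine, 700 is an arbitrary partner.
demo = "\n".join([
    atom_line(1, "CE", "LYS", "A", 600, 0.0, 0.0, 0.0),
    atom_line(2, "NZ", "LYS", "A", 600, 1.5, 0.0, 0.0),
    atom_line(3, "CZ", "PHE", "A", 700, 4.5, 0.0, 0.0),
    atom_line(4, "CE1", "PHE", "A", 700, 5.0, 1.2, 0.0),
])
atoms = parse_atoms(demo, chain="A")
print(round(min_distance(atoms, 600, 700), 2))
```

In practice the same functions could be applied to the text of a downloaded coordinate file; a dedicated structure library would also handle alternate locations and insertion codes, which this sketch ignores.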

Discussion 

Recruitment of the ppc-1 gene in four gains of C4 photosynthesis in Suaedoideae

Extensive phylogenetic analyses of the relationships among higher plant PEPC genes have previously been carried out in the families Cyperaceae, Poaceae, and Molluginaceae to determine which PEPC gene is recruited for use in C4 biochemistry (Christin et al., 2007, 2011; Besnard et al., 2009; Christin and Besnard, 2009). In the monocots, the same ppc-B2 orthologue is recruited for each independent gain of C4 photosynthesis, except in Stipagrostis, where the ppc-B2 orthologue has been lost and ppc-aL1b is used for C4 photosynthesis instead (Christin and Besnard, 2009). In the core eudicots, two primary PEPC gene lineages have been studied to date: ppc-1 and ppc-2 (Fig. 2) (Christin et al., 2011). Results from this study show that C4 Suaedoideae species (Chenopodiaceae) all recruit the orthologous ppc-1 gene for C4 use (Fig. 2), as has been found in Amaranthaceae and Molluginaceae (Gowik et al., 2006; Christin et al., 2011). This differs from Flaveria (Asteraceae), where the paralogous ppc-2 gene was recruited (Hermans and Westhoff, 1992; Westhoff and Gowik, 2004).

Selection for PEPC amino acid residues in C4 species

In Suaedoideae, there are four independent ppc-1 lineages that carry a number of non-synonymous substitutions (Table 1; Fig. 2). This is consistent with previous positive selection analyses of PEPC showing important adaptive changes in the PEPC sequences of C4 species (Christin et al., 2007, 2012; Besnard et al., 2009). What differs is the set of residues at which the changes occurred. Of the 15 amino acids identified to be under positive selection in this analysis (Table 1), only residue 733 was previously identified to be under selection in both the grasses and the sedges, while residues 794 and 863 were identified to be under selection in the grasses (Christin et al., 2007; Besnard et al., 2009). This variation in which amino acids are under selection may be attributed to the recruitment of different PEPC paralogues with different starting amino acid sequences. However, among the paralogous PEPC genes recruited for C4 photosynthesis in the grasses, eight of the 21 codons previously shown to be under positive selection in C4 grasses (ppc-B2 gene) also had the same amino acid substitution on the paralogous ppc-aL1b branch of C4 Stipagrostis (positions 517, 531, 572, 579, 625, 665, 733, and 780) (Christin and Besnard, 2009). The positive selection of ppc-1 amino acid residues in C4 Suaedoideae species suggests that these changes carry some fitness advantage. The lack of parallel substitutions between monocots and Suaedoideae might also be due to structural and ecological differences associated with optimization for C4 function. C4 function requires spatial separation between the site of atmospheric CO2 capture and the site of decarboxylation in the C4 cycle, which provides resistance to leakage and allows CO2 to be concentrated and assimilated by Rubisco. There are major anatomical and structural differences in how this is achieved among the four independent origins of C4 in Suaedoideae (and in comparison with monocots), namely in the arrangement of the cytoplasmic compartments of the two types of single-cell C4 species, and in the position of organelles in bundle sheath cells between the two Kranz lineages (Schütze et al., 2003; Edwards and Voznesenskaya, 2011). These species are succulent halophytes of semi-arid deserts, where high temperature, limited water, and saline soils could all contribute to CO2 limitation of photosynthesis, conditions under which C4 would be beneficial. The evolution of C4 photosynthesis in Chenopodiaceae was promoted by adaptation of species to dry and/or saline habitats (Kadereit et al., 2012). Thus, the exact PEPC amino acid modifications necessary for optimized C4 biochemical kinetics seem to vary across deeply divergent C4 origins, and further kinetic analyses of PEPC in C4 lineages are needed to characterize differences that may be linked to optimization for C4 function.

With respect to PEPC function, several residues have been clearly shown, through individual mutation, to be essential for catalytic function (reviewed in Kai et al., 2003). However, it is hard to determine experimentally which amino acid substitutions along the PEPC sequence have no effect and which actually improve enzyme kinetics for C4 function. Thus, it is difficult to know whether all the observed changes in both Suaedoideae C4 ppc-1 genes and previously analysed PEPC genes in the grasses and sedges are absolutely necessary, or act synergistically, to improve enzyme function for C4 biochemistry. Closely related species that use the same PEPC gene for C4 photosynthesis probably follow an optimal molecular path for amino acid changes that differs from that of distantly related species that recruited a different PEPC gene (Besnard et al., 2009). Among all of the family-specific adaptive changes that alter PEPC kinetics for use in C4 photosynthesis, residue 733 is the only codon that underwent a similar change in the sedges, grasses, and Suaedoideae (Supplementary Table S4 at JXB online). There seems to be no requirement for a specific residue at this location, as four different amino acid substitutions are observed in C4 species (Table 2; Supplementary Table S4). All of the substitutions at position 733 replace the bulky C3 phenylalanine with a less bulky amino acid: methionine, leucine, or arginine in Suaedoideae, or valine in the grasses and sedges. Substitution at 733 was observed in all C4 Suaedoideae species sampled and in none of the C3 species. This substitution is also observed in some other C4 species, such as Amaranthus hypochondriacus, Cleome gynandra, and some grasses and sedges, but not in all C4 species (Supplementary Table S4). Single amino acid substitutions in PEPC have recently been shown to have dramatic effects on enzyme kinetics (Paulus et al., 2013). While no study has specifically analysed the effect of a substitution at residue 733, it is in close proximity (>4 Å) to a lysine residue that is conserved across higher plant PEPC genes (606 maize numbering/600 Flaveria numbering) (Supplementary Fig. S2 at JXB online). Lysine 606/600 is proposed to be involved in substrate binding on the basis of mutation analysis: when Lys606 was mutated to an arginine or threonine, the Km for HCO3− increased up to 9-fold, with minimal effect on the overall maximal velocity (Vmax) (Gao and Woo, 1995).
The exact function of Lys606 is not known, since the residue is not required for enzyme activity; however, when it is mutated, the enzyme becomes less active at physiological pH and more strongly inhibited by malate (Gao and Woo, 1995). Closer analysis of the phenylalanine substitution at 733 shows that every substitution is to a smaller amino acid (Table 2), which increases the space around Lys606 and changes the solvent-exposed surface (Supplementary Fig. S2). If Lys606 is involved in HCO3− or, to a lesser extent, PEP binding, as has been proposed (Kai et al., 2003), then making it more accessible by replacing phenylalanine could increase the affinity of the enzyme for HCO3− (i.e. lower its Km), and possibly also for PEP. Substitution at 733 could thus be beneficial to C4 plants by increasing the rate of HCO3− utilization and potentially of C4 acid generation, which indicates that detailed kinetic analyses are needed.
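The statement that every C4 substitution at position 733 is to a smaller side chain can be checked against the property scales cited in the footnotes of Table 2. The sketch below stores a subset of the Kyte and Doolittle hydropathicity, Grantham polarity, and Zamyatin volume values (only the residues needed here) and computes derived-minus-ancestral differences; with these published values it reproduces the ΔH, ΔP, and ΔV entries of Table 2 for the 733 and 780 substitutions. The function name is illustrative and not part of the original analysis.

```python
# Subset of published per-residue scales (only residues used below).
KYTE_DOOLITTLE = {"A": 1.8, "F": 2.8, "L": 3.8, "M": 1.9, "R": -4.5, "S": -0.8}
GRANTHAM_POLARITY = {"A": 8.1, "F": 5.2, "L": 4.9, "M": 5.7, "R": 10.5, "S": 9.2}
ZAMYATIN_VOLUME = {"A": 88.6, "F": 189.9, "L": 166.7, "M": 162.9, "R": 173.4, "S": 89.0}


def property_deltas(c3_aa: str, c4_aa: str) -> tuple[float, float, float]:
    """(dH, dP, dV) for a C3 -> C4 replacement, computed as derived minus ancestral."""
    return tuple(
        round(scale[c4_aa] - scale[c3_aa], 1)
        for scale in (KYTE_DOOLITTLE, GRANTHAM_POLARITY, ZAMYATIN_VOLUME)
    )


for derived in "MLR":
    d_h, d_p, d_v = property_deltas("F", derived)
    print(f"F733{derived}: dH={d_h}, dP={d_p}, dV={d_v} (smaller side chain: {d_v < 0})")
print("A780S:", property_deltas("A", "S"))
```

A negative ΔV for all three 733 replacements is what supports the argument that the substitution opens space around Lys606; the other two columns show that the replacements are otherwise chemically heterogeneous.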

Substitutions at two amino acid positions, 890 and 780, have been described as beneficial for C4 function because they increase the efficiency of PEPC (Bläsing et al., 2000; Paulus et al., 2013). Studies on PEPC in Flaveria show that replacing arginine with glycine at position 890 reduces the affinity for malate and aspartate, which are inhibitors of PEPC. Studies also show that replacing alanine with serine at position 780 lowers the affinity (raises the Km) for PEP, meaning that higher levels of PEP are needed to saturate the enzyme, which allows higher concentrations of PEP to accumulate (Bläsing et al., 2000). Neither of these sites was identified to be under positive selection in Suaedoideae (Table 1). Both Schoberia Kranz species sampled, along with the two Bienertia species, had a serine at position 780 (Supplementary Table S4 at JXB online). While none of the C4 species sampled had a glycine at residue 890, all the Salsina Kranz C4 species as well as three C3 species had a methionine at this position (Supplementary Table S4). Conversely, there was positive selection for residues 880, 868, and 863, listed in order of decreasing proximity to 890 (Fig. 3). These residues are close to the proposed site of inhibition by the C4 acids aspartate and malate, with a substitution at residues 868 and 880 present in the majority, but not all, of the C4 species (Supplementary Table S4). This lack of parallel amino acid conversion in Suaedoideae C4 species indicates either that these substitutions are not necessary for effective C4 function, owing to compensating adjustments in C4 biochemistry or growth conditions, or that alternative amino acid substitutions can fulfil these functions. These results also suggest that C4 may be able to function effectively with minimal changes in orthologous PEPC genes.
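The consequence of a higher Km for PEP can be made concrete with the Michaelis-Menten relation v/Vmax = [S]/(Km + [S]), which rearranges to [S] = f·Km/(1 − f) for a target fraction f of Vmax. This sketch treats PEPC as a simple hyperbolic enzyme, ignoring its allosteric regulation and any cooperativity, and the Km values are hypothetical, chosen only to show that tripling Km triples the PEP concentration needed for the same fractional saturation.

```python
def fractional_velocity(substrate: float, km: float) -> float:
    """v / Vmax for a hyperbolic (Michaelis-Menten) enzyme; same units for both."""
    return substrate / (km + substrate)


def substrate_for_fraction(fraction: float, km: float) -> float:
    """Substrate concentration at which v / Vmax equals `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction * km / (1.0 - fraction)


# Hypothetical Km values for PEP (mM); not measurements from this study.
for label, km in (("lower Km", 0.1), ("Km raised threefold", 0.3)):
    needed = substrate_for_fraction(0.9, km)
    print(f"{label}: {needed:.2f} mM PEP for 90% of Vmax; "
          f"v/Vmax at 0.5 mM = {fractional_velocity(0.5, km):.2f}")
```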

The results from the two variations of the branch-site test (Table 1), labelling either all the Suaedoideae C4 branches or only the branches where C4 is thought to have evolved, suggest that there may be an order of selection or stronger selective pressure on residues at positions 733 and 868, as noted above. At residue 733, all 12 C4 species had a substitution for something other than phenylalanine. However, as noted above, there is no strong selection for serine at position 780 (lacking in eight C4 species). This raises the question of how selective pressures on separate PEPC gene lineages may change if one mutation can affect the selective pressure on subsequent mutations in the same functional area of the enzyme. For example, if residue 733 mutates from the bulky phenylalanine to a less bulky amino acid (potentially increasing the affinity for HCO3−) before a substitution to serine at 780 (which may decrease the affinity for PEP), does this lower the selective pressure for a subsequent substitution at position 780? Reciprocally, would a substitution to serine at residue 780 lower the selective pressure for fixation of a mutation at residue 733, possibly explaining what is observed in F. trinervia, Alternanthera pungens, and Mollugo cerviana, where a substitution at residue 733 is absent but there is a serine at residue 780? Thus, kinetic analyses of the different PEPC forms are needed to determine how positive selection at specific codons affects kinetic properties and the response to allosteric effectors compared with the C3 orthologues.

The substitution of serine for alanine at position 780 has been observed in almost every C4 species analysed to date (Bläsing et al., 2000; Christin et al., 2007, 2011; Besnard et al., 2009; Christin and Besnard, 2009). This serine substitution at position 780 has also been observed in the paralogous ppc-B1 gene of the C4 species Centropodia forskalii (Christin et al., 2007), although it is not clear whether this gene is co-expressed with the ppc-B2 gene or why there would be selection for this residue. However, Hydrilla verticillata (an inducible, aquatic, single-cell C4 system) provides an example of a PEPC gene that participates in C4 biochemistry but lacks the C4 signature serine, having instead the characteristic C3 alanine. Surprisingly, in vitro kinetic analysis showed that the serine was not essential for C4-like kinetics; in fact, substituting serine for alanine in HVPEPC4 (the product of the PEPC gene proposed to function in the C4 cycle) was detrimental, reducing Vmax and kcat values, although the same (serine for alanine) substitution in the anaplerotic form, HVPEPC3, altered the kinetics to become more C4-like (Rao et al., 2008).

The results of the current study show that only four of the 12 C4 species sampled (Table 2; Supplementary Table S4 at JXB online) have a serine at position 780, indicating that species can perform C4 photosynthesis with an alanine at this position. Bienertia sinuspersici is the second C4 plant, after Mollugo cerviana (fragilis group), identified as having what appears to be a recent duplication of the ppc-1 gene, with selection acting on only one of the two gene copies (Christin et al., 2011). Whether both copies are expressed is not known, but if selection occurs on only one of the two paralogues, this may shed light on the regulation of C4 gene expression. Furthermore, Mollugo cerviana (cerviana group) populations that lacked a substitution for a serine residue at position 780 in the ppc-1 gene were suggested not to be fully optimized for C4 biochemistry, although their carbon isotope composition is typical of C4 plants. Physiological analyses of species from the four C4 clades in Suaedoideae, which differ in the form of PEPC at position 780, indicate that they are functionally C4 (Smith et al., 2009; King et al., 2012). This indicates that the C4-ness of a species cannot be determined by Ser780 alone.

Finding C4 species without a Ser780 would not be the first time that experimental evidence on PEPC has prompted a re-evaluation of how enzyme kinetics are modified and of the context of in vivo regulation. For example, PEPC has a conserved N-terminal serine that is subject to reversible phosphorylation in response to light, and this phosphorylation has been shown to reduce the inhibitory effects of malate in vitro (Tsuchida et al., 2001). When this residue is not phosphorylated in vivo, there are no observable effects on CO2 assimilation rates in transgenic F. bidentis, raising the question of how phosphorylation is biochemically related to C4 photosynthesis if it is not essential for efficient C4 function (Furumoto et al., 2007). PEPCs from both single-cell C4 types and S. eltonica were shown to undergo phosphorylation at this conserved N-terminal serine in the light, analogous to other C4 systems, whereas PEPC from the C3 plant S. linifolia did not (Lara et al., 2006). This suggests the potential for tolerating accumulation of high levels of malate during photosynthesis in these C4 plants, but further analysis is needed to understand whether this is biochemically necessary and how it relates to in vivo levels of metabolites. This is analogous to the question of what effect substitution at position 780 in PEPC has on kinetics in vitro, and whether this can be observed to be beneficial under certain conditions in vivo.



Conclusion 

To date, there are only a few reports on the recruitment and modification of PEPC along lineages that evolved C4 in the eudicots. In Suaedoideae, the ppc-1 gene is used in C4 photosynthesis, as observed in the C4 eudicot families Amaranthaceae and Molluginaceae, which parallels the predominant recruitment of the same PEPC orthologue in C4 grasses and sedges. Unlike in the monocots, there is less evidence in Suaedoideae that a high number of positively selected PEPC residues is necessary. Further analysis is needed to determine whether the observed amino acid differences in Suaedoideae are more or less common across the Chenopodiaceae, their effect on PEPC kinetic properties, and ultimately how these changes are beneficial for C4 photosynthesis.



Supplementary data 

Supplementary data are available at JXB online. 

Figure S1. Suaedoideae phylogeny, based only on ppc-1 third codon positions plus intron sequence, that was used for the positive selection analysis.

Figure S2. Cartoon representation of the C4 PEPC enzyme structure of Flaveria trinervia (ppc-2 gene) (Paulus et al., 2013), showing the spatial effect of a substitution at residue 733 (maize numbering) in relation to residue 606.

Table S1. Names and sequences of primers, and the species for which each primer was used in sequencing Suaedoideae ppc-1 and ppc-2 genes.

Table S2. List of species origin, voucher, and ppc sequence 
accession numbers generated in this study. 

Table S3. List of Chenopodioideae species used in phylogenetic analyses, with marker accession numbers.

Table S4. Comparison across eudicot families of the ppc-1 exon 8, 9, and 10 amino acids identified to be under positive selection in Suaedoideae.